Chapter 17 Neural Language Modeling

Language modeling is central to many important natural language processing tasks. Recently, neural-network-based language models have demonstrated better performance than classical methods, both standalone and as part of more challenging natural language processing tasks. In this chapter, you will discover language modeling for natural language processing. After reading this chapter, you will know:

- Why language modeling is critical to addressing tasks in natural language processing.
- What a language model is and some examples of where they are used.
- How neural networks can be used for language modeling.

Let's get started.

Overview

This tutorial is divided into the following parts:

1. Problem of Modeling Language
2. Statistical Language Modeling
3. Neural Language Models

17.2 Problem of Modeling Language

Formal languages, like programming languages, can be fully specified. All of the reserved words can be defined, and the valid ways that they can be used can be precisely defined. We cannot do this with natural language. Natural languages are not designed; they emerge, and therefore there is no formal specification. There may be formal rules and heuristics for parts of the language, but as soon as rules are defined, you will devise or encounter counterexamples that contradict them. Natural languages involve vast numbers of terms that can be used in ways that introduce all kinds of ambiguities, yet can still be understood by other humans.
Further, languages change, word usages change: it is a moving target. Nevertheless, linguists try to specify the language with formal grammars and structures. It can be done, but it is very difficult and the results can be fragile. An alternative approach to specifying the model of the language is to learn it from examples.

17.3 Statistical Language Modeling

Statistical Language Modeling, or Language Modeling (LM for short), is the development of probabilistic models that are able to predict the next word in the sequence given the words that precede it.

Language modeling is the task of assigning a probability to sentences in a language. [...] Besides assigning a probability to each sequence of words, the language model also assigns a probability for the likelihood of a given word (or a sequence of words) to follow a sequence of words.

Page 105, Neural Network Methods in Natural Language Processing.

A language model learns the probability of word occurrence based on examples of text. Simpler models may look at a context of a short sequence of words, whereas larger models may work at the level of sentences or paragraphs. Most commonly, language models operate at the level of words. The notion of a language model is inherently probabilistic.

A language model is a function that puts a probability measure over strings drawn from some vocabulary.

Page 238, An Introduction to Information Retrieval.

A language model can be developed and used standalone, such as to generate new sequences of text that appear to have come from the corpus. Language modeling is a root problem for a large range of natural language processing tasks. More practically, language models are used on the front-end or back-end of a more sophisticated model for a task that requires language understanding.

... language modeling is a crucial component in real-world applications such as machine translation and automatic speech recognition. [...] For these reasons, language modeling plays a central role in natural-language processing, AI, and machine-learning research.

Page 105, Neural Network Methods in Natural Language Processing.

A good example is speech recognition, where audio data is used as an input to the model and the output requires a language model that interprets the input signal and recognizes each new word within the context of the words already recognized.

Speech recognition is principally concerned with the problem of transcribing the speech signal as a sequence of words. [...] From this point of view, speech is assumed to be generated by a language model which provides estimates of Pr(w) for all word strings w independently of the observed signal. [...] The goal of speech recognition is to find the most likely word sequence given the observed acoustic signal.

The Oxford Handbook of Computational Linguistics, 2005.
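To make the idea of assigning a probability to the next word concrete, below is a minimal sketch of a classical count-based bigram (2-gram) model. The toy corpus, function names, and probabilities are illustrative assumptions, not material from this book.

# A minimal sketch of a count-based bigram language model, illustrative only.
from collections import Counter, defaultdict

def train_bigram_model(text):
    # count how often each word follows each other word
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev_word, next_word in zip(words[:-1], words[1:]):
        counts[prev_word][next_word] += 1
    return counts

def next_word_probability(counts, prev_word, next_word):
    # maximum-likelihood estimate: count(prev, next) / count(prev, *)
    total = sum(counts[prev_word].values())
    if total == 0:
        return 0.0
    return counts[prev_word][next_word] / total

corpus = "the king was in his counting house counting out his money"
model = train_bigram_model(corpus)
print(next_word_probability(model, 'counting', 'house'))  # 0.5
print(next_word_probability(model, 'counting', 'out'))    # 0.5

Such count-based estimates degrade quickly as the context grows, because most longer word sequences are never observed; this sparsity problem is one motivation for the neural approaches discussed below.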
Language models are similarly used to generate text in many other natural language processing tasks, for example:

- Optical Character Recognition.
- Handwriting Recognition.
- Machine Translation.
- Spelling Correction.
- Image Captioning.
- Text Summarization.
- And much more.

Language modeling is the art of determining the probability of a sequence of words. This is useful in a large variety of areas including speech recognition, optical character recognition, handwriting recognition, machine translation, and spelling correction.

A Bit of Progress in Language Modeling.

Developing better language models often results in models that perform better on their intended natural language processing task. This is the motivation for developing better and more accurate language models.

[language models] have played a key role in traditional NLP tasks such as speech recognition, machine translation, or text summarization. Often (although not always), training better language models improves the underlying metrics of the downstream task (such as word error rate for speech recognition, or BLEU score for translation), which makes the task of training better LMs valuable by itself.

Exploring the Limits of Language Modeling.

17.4 Neural Language Models

Recently, the use of neural networks in the development of language models has become so popular that it may now be the preferred approach. The use of neural networks in language modeling is often called Neural Language Modeling, or NLM for short. Neural network approaches are achieving better results than classical methods, both on standalone language models and when models are incorporated into larger models for challenging tasks like speech recognition and machine translation. A key reason for the leaps in performance may be the method's ability to generalize.
Nonlinear neural network models solve some of the shortcomings of traditional language models: they allow conditioning on increasingly large context sizes with only a linear increase in the number of parameters, they alleviate the need for manually designing backoff orders, and they support generalization across different contexts.

Page 109, Neural Network Methods in Natural Language Processing.

Specifically, a word embedding is adopted that uses a real-valued vector to represent each word in a projected vector space. This learned representation of words, based on their usage, allows words with a similar meaning to have a similar representation.

Neural Language Models (NLM) address the n-gram data sparsity issue through parameterization of words as vectors (word embeddings) and using them as inputs to a neural network. The parameters are learned as part of the training process. Word embeddings obtained through NLMs exhibit the property whereby semantically close words are likewise close in the induced vector space.

Character-Aware Neural Language Models.

This generalization is something that the representation used in classical statistical language models cannot easily achieve.

True generalization is difficult to obtain in a discrete word index space, since there is no obvious relation between the word indices.

Connectionist language modeling for large vocabulary continuous speech recognition.

Further, the distributed representation approach allows the embedding representation to scale better with the size of the vocabulary. Classical methods that have one discrete representation per word fight the curse of dimensionality: larger and larger vocabularies of words result in longer and more sparse representations. The neural network approach to language modeling can be described using the following three model properties, taken from A Neural Probabilistic Language Model:

1. Associate each word in the vocabulary with a distributed word feature vector.
2. Express the joint probability function of word sequences in terms of the feature vectors of these words in the sequence.
3. Learn simultaneously the word feature vector and the parameters of the probability function.

This represents a relatively simple model where both the representation and probabilistic model are learned together directly from raw text data.
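As a rough illustration of these three properties, the sketch below defines a tiny Keras model that learns a distributed word feature vector (Embedding), expresses the next-word probability as a function of the context word vectors, and learns both together when fit. The vocabulary size, context window, and layer sizes are arbitrary assumptions for illustration only; this is not the model developed later in the book.

# Illustrative sketch of the three properties above; sizes are arbitrary assumptions.
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

vocab_size = 1000   # assumed size of the word vocabulary
window = 5          # assumed number of preceding words used as context

model = Sequential()
# property 1: each word index is mapped to a learned distributed feature vector
model.add(Embedding(vocab_size, 50, input_length=window))
model.add(Flatten())
# property 2: the probability of the next word is expressed as a function
# of the feature vectors of the context words
model.add(Dense(128, activation='relu'))
model.add(Dense(vocab_size, activation='softmax'))
# property 3: the embeddings and the probability function parameters are
# learned together when the model is fit on example sequences
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()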
Recently, the neural-based approaches have started to outperform the classical statistical approaches.

We provide ample empirical evidence to suggest that connectionist language models are superior to standard n-gram techniques, except their high computational (training) complexity.

Recurrent neural network based language model.

Initially, feedforward neural network models were used to introduce the approach. More recently, recurrent neural networks and then networks with a long-term memory, like the Long Short-Term Memory network, or LSTM, allow the models to learn the relevant context over much longer input sequences than the simpler feedforward networks.

[an RNN language model] provides further generalization: instead of considering just several preceding words, neurons with input from recurrent connections are assumed to represent short term memory. The model learns itself from the data how to represent memory. While shallow feedforward neural networks (those with just one hidden layer) can only cluster similar words, recurrent neural networks (which can be considered as a deep architecture) can perform clustering of similar histories. This allows for instance efficient representation of patterns with variable length.

Extensions of recurrent neural network language model.

Recently, researchers have been seeking the limits of these language models. In the paper Exploring the Limits of Language Modeling, evaluating language models over large datasets, such as the corpus of one billion words, the authors find that LSTM-based neural language models outperform the classical methods.

... we have shown that RNN LMs can be trained on large amounts of data, and outperform competing models including carefully tuned N-grams.

Exploring the Limits of Language Modeling.

Further, they propose some heuristics for developing high-performing neural language models in general:

- Size matters. The best models were the largest models, specifically in the number of memory units.
- Regularization matters. Use of regularization like dropout on input connections improves results (see the sketch after this list).
- CNNs vs Embeddings. Character-level Convolutional Neural Network (CNN) models can be used on the front-end instead of word embeddings, achieving similar and sometimes better results.
- Ensembles matter. Combining the predictions from multiple models can offer large improvements in model performance.
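To make the regularization heuristic concrete, here is a hedged sketch of applying dropout to the input and recurrent connections of an LSTM layer in Keras. The vocabulary size, layer sizes, and dropout rates are arbitrary illustrative assumptions; this is not a configuration from the paper or from this book.

# Illustrative only: dropout on input connections (dropout=) and recurrent
# connections (recurrent_dropout=) of a Keras LSTM layer; sizes are arbitrary.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size = 1000  # assumed vocabulary size

model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=10))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')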
17.5 Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Books

- Neural Network Methods in Natural Language Processing.
- Natural Language Processing, Artificial Intelligence: A Modern Approach.
- Language models for information retrieval, An Introduction to Information Retrieval.

Papers

- A Neural Probabilistic Language Model, NIPS.
- A Neural Probabilistic Language Model, JMLR.
- Connectionist language modeling for large vocabulary continuous speech recognition.
- Recurrent neural network based language model.
- Extensions of recurrent neural network language model.
- Character-Aware Neural Language Models.
- LSTM Neural Networks for Language Modeling.
- Exploring the Limits of Language Modeling.

Articles

- Language Model, Wikipedia.
- Neural net language models, Scholarpedia.
17.6 Summary

In this chapter, you discovered language modeling for natural language processing tasks. Specifically, you learned:

- That natural language is not formally specified and requires the use of statistical models to learn from examples.
- That statistical language models are central to many challenging natural language processing tasks.
- That state-of-the-art results are achieved using neural language models, specifically those with word embeddings and recurrent neural network algorithms.

Next

In the next chapter, you will discover how you can develop a character-based neural language model.
Chapter 18 How to Develop a Character-Based Neural Language Model

A language model predicts the next word in the sequence based on the specific words that have come before it in the sequence. It is also possible to develop language models at the character level using neural networks. The benefit of character-based language models is their small vocabulary and flexibility in handling any words, punctuation, and other document structure. This comes at the cost of requiring larger models that are slower to train. Nevertheless, in the field of neural language models, character-based models offer a lot of promise for a general, flexible and powerful approach to language modeling. In this tutorial, you will discover how to develop a character-based neural language model. After completing this tutorial, you will know:

- How to prepare text for character-based language modeling.
- How to develop a character-based language model using LSTMs.
- How to use a trained character-based language model to generate text.

Let's get started.

Tutorial Overview

This tutorial is divided into the following parts:

1. Sing a Song of Sixpence
2. Data Preparation
3. Train Language Model
4. Generate Text
18.2 Sing a Song of Sixpence

The nursery rhyme Sing a Song of Sixpence is well known in the West. The first verse is common, but there is also a 4-verse version that we will use to develop our character-based language model. It is short, so fitting the model will be fast, but not so short that we won't see anything interesting. The complete 4-verse version we will use as source text is listed below.

Sing a song of sixpence,
A pocket full of rye.
Four and twenty blackbirds,
Baked in a pie.

When the pie was opened
The birds began to sing;
Wasn't that a dainty dish,
To set before the king.

The king was in his counting house,
Counting out his money;
The queen was in the parlour,
Eating bread and honey.

The maid was in the garden,
Hanging out the clothes,
When down came a blackbird
And pecked off her nose.

Listing 18.1: Sing a Song of Sixpence nursery rhyme.

Copy the text and save it in a new file in your current working directory with the file name rhyme.txt.

18.3 Data Preparation

The first step is to prepare the text data. We will start by defining the type of language model.

Language Model Design

A language model must be trained on the text, and in the case of a character-based language model, the input and output sequences must be characters. The number of characters used as input will also define the number of characters that will need to be provided to the model in order to elicit the first predicted character. After the first character has been generated, it can be appended to the input sequence and used as input for the model to generate the next character. Longer sequences offer more context for the model to learn which character to output next, but take longer to train and impose more burden on seeding the model when generating text. We will use an arbitrary length of 10 characters for this model. There is not a lot of text, and 10 characters is a few words. We can now transform the raw text into a form that our model can learn; specifically, input and output sequences of characters.
Load Text

We must load the text into memory so that we can work with it. Below is a function named load_doc() that will load a text file given a filename and return the loaded text.

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

Listing 18.2: Function to load a document into memory.

We can call this function with the filename of the nursery rhyme rhyme.txt to load the text into memory. The contents of the file are then printed to screen as a sanity check.

# load text
raw_text = load_doc('rhyme.txt')
print(raw_text)

Listing 18.3: Load the document into memory.

Clean Text

Next, we need to clean the loaded text. We will not do much to it in this example. Specifically, we will strip all of the new line characters so that we have one long sequence of characters separated only by white space.

# clean
tokens = raw_text.split()
raw_text = ' '.join(tokens)

Listing 18.4: Tokenize the loaded document.

You may want to explore other methods for data cleaning, such as normalizing the case to lowercase or removing punctuation in an effort to reduce the final vocabulary size and develop a smaller and leaner model.

Create Sequences

Now that we have a long list of characters, we can create our input-output sequences used to train the model. Each input sequence will be 10 characters with one output character, making each sequence 11 characters long. We can create the sequences by enumerating the characters in the text, starting at the 11th character at index 10.

# organize into sequences of characters
length = 10
sequences = list()
for i in range(length, len(raw_text)):
    # select sequence of tokens
    seq = raw_text[i-length:i+1]
    # store
    sequences.append(seq)
print('Total Sequences: %d' % len(sequences))

Listing 18.5: Convert text into fixed-length sequences.

Running this snippet, we can see that we end up with just under 400 sequences of characters for training our language model.

Total Sequences: 399

Listing 18.6: Example output of converting text into fixed-length sequences.

Save Sequences

Finally, we can save the prepared data to file so that we can load it later when we develop our model. Below is a function save_doc() that, given a list of strings and a filename, will save the strings to file, one per line.

# save tokens to file, one sequence per line
def save_doc(lines, filename):
    data = '\n'.join(lines)
    file = open(filename, 'w')
    file.write(data)
    file.close()

Listing 18.7: Function to save sequences to file.

We can call this function and save our prepared sequences to the filename char_sequences.txt in our current working directory.

# save sequences to file
out_filename = 'char_sequences.txt'
save_doc(sequences, out_filename)

Listing 18.8: Call function to save sequences to file.

Complete Example

Tying all of this together, the complete code listing is provided below.

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# save tokens to file, one sequence per line
def save_doc(lines, filename):
    data = '\n'.join(lines)
    file = open(filename, 'w')
    file.write(data)
    file.close()

# load text
raw_text = load_doc('rhyme.txt')
print(raw_text)

# clean
tokens = raw_text.split()
raw_text = ' '.join(tokens)

# organize into sequences of characters
length = 10
sequences = list()
for i in range(length, len(raw_text)):
    # select sequence of tokens
    seq = raw_text[i-length:i+1]
    # store
    sequences.append(seq)
print('Total Sequences: %d' % len(sequences))

# save sequences to file
out_filename = 'char_sequences.txt'
save_doc(sequences, out_filename)

Listing 18.9: Complete example of preparing the text data.

Run the example to create the char_sequences.txt file. Take a look inside; you should see something like the following:

Sing a song
ing a song 
ng a song o
g a song of
 a song of 
a song of s
 song of si
song of six
ong of sixp
ng of sixpe
...

Listing 18.10: Sample of the output file.

We are now ready to train our character-based neural language model.

18.4 Train Language Model

In this section, we will develop a neural language model for the prepared sequence data. The model will read encoded characters and predict the next character in the sequence. A Long Short-Term Memory recurrent neural network hidden layer will be used to learn the context from the input sequence in order to make the predictions.

Load Data

The first step is to load the prepared character sequence data from char_sequences.txt. We can use the same load_doc() function developed in the previous section. Once loaded, we split
the text by new line to give a list of sequences ready to be encoded.

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# load
in_filename = 'char_sequences.txt'
raw_text = load_doc(in_filename)
lines = raw_text.split('\n')

Listing 18.11: Load the prepared text data.

Encode Sequences

The sequences of characters must be encoded as integers. This means that each unique character will be assigned a specific integer value and each sequence of characters will be encoded as a sequence of integers. We can create the mapping given a sorted set of unique characters in the raw input data. The mapping is a dictionary of character values to integer values.

chars = sorted(list(set(raw_text)))
mapping = dict((c, i) for i, c in enumerate(chars))

Listing 18.12: Create a mapping between chars and integers.

Next, we can process each sequence of characters one at a time and use the dictionary mapping to look up the integer value for each character.

sequences = list()
for line in lines:
    # integer encode line
    encoded_seq = [mapping[char] for char in line]
    # store
    sequences.append(encoded_seq)

Listing 18.13: Integer encode sequences of characters.

The result is a list of integer lists. We need to know the size of the vocabulary later. We can retrieve this as the size of the dictionary mapping.

# vocabulary size
vocab_size = len(mapping)
print('Vocabulary Size: %d' % vocab_size)

Listing 18.14: Summarize the size of the vocabulary.

Running this piece, we can see that there are 38 unique characters in the input sequence data.

Vocabulary Size: 38

Listing 18.15: Example output from summarizing the size of the vocabulary.
Split Inputs and Output

Now that the sequences have been integer encoded, we can separate the columns into input and output sequences of characters. We can do this using a simple array slice.

sequences = array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]

Listing 18.16: Split sequences into input and output elements.

Next, we need to one hot encode each character. That is, each character becomes a vector as long as the vocabulary (38 elements) with a 1 marked for the specific character. This provides a more precise input representation for the network. It also provides a clear objective for the network to predict, where a probability distribution over characters can be output by the model and compared to the ideal case of all 0 values with a 1 for the actual next character. We can use the to_categorical() function in the Keras API to one hot encode the input and output sequences.

sequences = [to_categorical(x, num_classes=vocab_size) for x in X]
X = array(sequences)
y = to_categorical(y, num_classes=vocab_size)

Listing 18.17: Convert sequences into a format ready for training.

We are now ready to fit the model.

Fit Model

The model is defined with an input layer that takes sequences with 10 time steps and 38 features for the one hot encoded input sequences. Rather than specify these numbers, we use the second and third dimensions of the X input data. This is so that if we change the length of the sequences or the size of the vocabulary, we do not need to change the model definition. The model has a single LSTM hidden layer with 75 memory cells, chosen with a little trial and error. The model has a fully connected output layer that outputs one vector with a probability distribution across all characters in the vocabulary. A softmax activation function is used on the output layer to ensure the output has the properties of a probability distribution.

# define the model
def define_model(X):
    model = Sequential()
    model.add(LSTM(75, input_shape=(X.shape[1], X.shape[2])))
    model.add(Dense(vocab_size, activation='softmax'))
    # compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize defined model
    model.summary()
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

Listing 18.18: Define the language model.

The model is learning a multiclass classification problem, therefore we use the categorical log loss intended for this type of problem. The efficient Adam implementation of gradient descent is used to optimize the model, and accuracy is reported at the end of each batch update.
The model is fit for 100 training epochs, again found with a little trial and error. Running this prints a summary of the defined network as a sanity check.

Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 75)                34200
_________________________________________________________________
dense_1 (Dense)              (None, 38)                2888
=================================================================
Total params: 37,088
Trainable params: 37,088
Non-trainable params: 0

Listing 18.19: Example output from summarizing the defined model.

A plot of the defined model is then saved to file with the name model.png.

Figure 18.1: Plot of the defined character-based language model.

Save Model

After the model is fit, we save it to file for later use. The Keras model API provides the save() function that we can use to save the model to a single file, including weights and topology information.

# save the model to file
model.save('model.h5')

Listing 18.20: Save the fit model to file.

We also save the mapping from characters to integers, which we will need to encode any input when using the model and decode any output from the model.

# save the mapping
dump(mapping, open('mapping.pkl', 'wb'))

Listing 18.21: Save the mapping of chars to integers to file.
Complete Example

Tying all of this together, the complete code listing for fitting the character-based neural language model is listed below.

from numpy import array
from pickle import dump
from keras.utils import to_categorical
from keras.utils.vis_utils import plot_model
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# define the model
def define_model(X):
    model = Sequential()
    model.add(LSTM(75, input_shape=(X.shape[1], X.shape[2])))
    model.add(Dense(vocab_size, activation='softmax'))
    # compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize defined model
    model.summary()
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

# load
in_filename = 'char_sequences.txt'
raw_text = load_doc(in_filename)
lines = raw_text.split('\n')

# integer encode sequences of characters
chars = sorted(list(set(raw_text)))
mapping = dict((c, i) for i, c in enumerate(chars))
sequences = list()
for line in lines:
    # integer encode line
    encoded_seq = [mapping[char] for char in line]
    # store
    sequences.append(encoded_seq)

# vocabulary size
vocab_size = len(mapping)
print('Vocabulary Size: %d' % vocab_size)

# separate into input and output
sequences = array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]
sequences = [to_categorical(x, num_classes=vocab_size) for x in X]
X = array(sequences)
y = to_categorical(y, num_classes=vocab_size)
# define model
model = define_model(X)
# fit model
model.fit(X, y, epochs=100, verbose=2)
# save the model to file
model.save('model.h5')
# save the mapping
dump(mapping, open('mapping.pkl', 'wb'))

Listing 18.22: Complete example of training the language model.

Running the example might take one minute. You will see that the model learns the problem well, perhaps too well for generating surprising sequences of characters.

...
Epoch 96/100
0s - loss: acc:
Epoch 97/100
0s - loss: acc:
Epoch 98/100
0s - loss: acc:
Epoch 99/100
0s - loss: acc:
Epoch 100/100
0s - loss: acc:

Listing 18.23: Example output from training the language model.

At the end of the run, you will have two files saved to the current working directory, specifically model.h5 and mapping.pkl. Next, we can look at using the learned model.

18.5 Generate Text

We will use the learned language model to generate new sequences of text that have the same statistical properties.

Load Model

The first step is to load the model saved to the file model.h5. We can use the load_model() function from the Keras API.

# load the model
model = load_model('model.h5')

Listing 18.24: Load the saved model.

We also need to load the pickled dictionary for mapping characters to integers from the file mapping.pkl. We will use the Pickle API to load the object.

# load the mapping
mapping = load(open('mapping.pkl', 'rb'))

Listing 18.25: Load the saved mapping from chars to integers.

We are now ready to use the loaded model.
Generate Characters

We must provide sequences of 10 characters as input to the model in order to start the generation process. We will pick these manually. A given input sequence will need to be prepared in the same way as the training data for the model. First, the sequence of characters must be integer encoded using the loaded mapping.

# encode the characters as integers
encoded = [mapping[char] for char in in_text]

Listing 18.26: Encode input text to integers.

Next, the integers need to be one hot encoded using the to_categorical() Keras function. We also need to reshape the sequence to be 3-dimensional, as we only have one sequence and LSTMs require all input to be three dimensional (samples, time steps, features).

# one hot encode
encoded = to_categorical(encoded, num_classes=len(mapping))
encoded = encoded.reshape(1, encoded.shape[0], encoded.shape[1])

Listing 18.27: One hot encode the integer encoded text.

We can then use the model to predict the next character in the sequence. We use predict_classes() instead of predict() to directly select the integer for the character with the highest probability, instead of getting the full probability distribution across the entire set of characters.

# predict character
yhat = model.predict_classes(encoded, verbose=0)

Listing 18.28: Predict the next character in the sequence.

We can then decode this integer by looking up the mapping to see the character to which it maps.

out_char = ''
for char, index in mapping.items():
    if index == yhat:
        out_char = char
        break

Listing 18.29: Map the predicted integer back to a character.

This character can then be appended to the input sequence. We then need to make sure that the input sequence is 10 characters by truncating the first character from the input sequence text. We can use the pad_sequences() function from the Keras API to perform this truncation operation. Putting all of this together, we can define a new function named generate_seq() for using the loaded model to generate new sequences of text.

# generate a sequence of characters with a language model
def generate_seq(model, mapping, seq_length, seed_text, n_chars):
    in_text = seed_text
    # generate a fixed number of characters
    for _ in range(n_chars):
        # encode the characters as integers
        encoded = [mapping[char] for char in in_text]
        # truncate sequences to a fixed length
        encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')
        # one hot encode
        encoded = to_categorical(encoded, num_classes=len(mapping))
        encoded = encoded.reshape(1, encoded.shape[0], encoded.shape[1])
        # predict character
        yhat = model.predict_classes(encoded, verbose=0)
        # reverse map integer to character
        out_char = ''
        for char, index in mapping.items():
            if index == yhat:
                out_char = char
                break
        # append to input
        in_text += out_char
    return in_text

Listing 18.30: Function to predict a sequence of characters given seed text.

Complete Example

Tying all of this together, the complete example for generating text using the fit neural language model is listed below.

from pickle import load
from keras.models import load_model
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences

# generate a sequence of characters with a language model
def generate_seq(model, mapping, seq_length, seed_text, n_chars):
    in_text = seed_text
    # generate a fixed number of characters
    for _ in range(n_chars):
        # encode the characters as integers
        encoded = [mapping[char] for char in in_text]
        # truncate sequences to a fixed length
        encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')
        # one hot encode
        encoded = to_categorical(encoded, num_classes=len(mapping))
        encoded = encoded.reshape(1, encoded.shape[0], encoded.shape[1])
        # predict character
        yhat = model.predict_classes(encoded, verbose=0)
        # reverse map integer to character
        out_char = ''
        for char, index in mapping.items():
            if index == yhat:
                out_char = char
                break
        # append to input
        in_text += out_char
    return in_text

# load the model
model = load_model('model.h5')
# load the mapping
mapping = load(open('mapping.pkl', 'rb'))
# test start of rhyme
print(generate_seq(model, mapping, 10, 'Sing a son', 20))
# test mid-line
print(generate_seq(model, mapping, 10, 'king was i', 20))
# test not in original
print(generate_seq(model, mapping, 10, 'hello worl', 20))

Listing 18.31: Complete example of generating characters with the fit model.

Running the example generates three sequences of text. The first is a test to see how the model does at starting from the beginning of the rhyme. The second is a test to see how well it does at beginning in the middle of a line. The final example is a test to see how well it does with a sequence of characters never seen before.

Note: Given the stochastic nature of neural networks, your specific results may vary. Consider running the example a few times.

Sing a song of sixpence, A poc
king was in his counting house
hello worls e pake wofey. The

Listing 18.32: Example output from generating sequences of characters.

We can see that the model did very well with the first two examples, as we would expect. We can also see that the model still generated something for the new text, but it is nonsense.

18.6 Further Reading

This section provides more resources on the topic if you are looking to go deeper.

- Sing a Song of Sixpence on Wikipedia.
- Keras Utils API.
- Keras Sequence Processing API.

18.7 Summary

In this tutorial, you discovered how to develop a character-based neural language model. Specifically, you learned:

- How to prepare text for character-based language modeling.
- How to develop a character-based language model using LSTMs.
- How to use a trained character-based language model to generate text.
Next

In the next chapter, you will discover how you can develop a word-based neural language model.
Chapter 19 How to Develop a Word-Based Neural Language Model

Language modeling involves predicting the next word in a sequence given the sequence of words already present. A language model is a key element in many natural language processing models such as machine translation and speech recognition. The choice of how the language model is framed must match how the language model is intended to be used. In this tutorial, you will discover how the framing of a language model affects the skill of the model when generating short sequences from a nursery rhyme. After completing this tutorial, you will know:

- The challenge of developing a good framing of a word-based language model for a given application.
- How to develop one-word, two-word, and line-based framings for word-based language models.
- How to generate sequences using a fit language model.

Let's get started.

Tutorial Overview

This tutorial is divided into the following parts:

1. Framing Language Modeling
2. Jack and Jill Nursery Rhyme
3. Model 1: One-Word-In, One-Word-Out Sequences
4. Model 2: Line-by-Line Sequence
5. Model 3: Two-Words-In, One-Word-Out Sequence
19.2 Framing Language Modeling

A statistical language model is learned from raw text and predicts the probability of the next word in the sequence given the words already present in the sequence. Language models are a key component in larger models for challenging natural language processing problems, like machine translation and speech recognition. They can also be developed as standalone models and used for generating new sequences that have the same statistical properties as the source text.

Language models both learn and predict one word at a time. The training of the network involves providing sequences of words as input that are processed one at a time, where a prediction can be made and learned for each input sequence. Similarly, when making predictions, the process can be seeded with one or a few words, then predicted words can be gathered and presented as input on subsequent predictions in order to build up a generated output sequence. Therefore, each model will involve splitting the source text into input and output sequences, such that the model can learn to predict words. There are many ways to frame the sequences from a source text for language modeling. In this tutorial, we will explore 3 different ways of developing word-based language models in the Keras deep learning library. There is no single best approach, just different framings that may suit different applications.

19.3 Jack and Jill Nursery Rhyme

Jack and Jill is a simple nursery rhyme. It is comprised of 4 lines, as follows:

Jack and Jill went up the hill
To fetch a pail of water
Jack fell down and broke his crown
And Jill came tumbling after

Listing 19.1: Jack and Jill nursery rhyme.

We will use this as our source text for exploring different framings of a word-based language model. We can define this text in Python as follows:

# source text
data = """ Jack and Jill went up the hill\n
To fetch a pail of water\n
Jack fell down and broke his crown\n
And Jill came tumbling after\n """

Listing 19.2: Sample text for this tutorial.

19.4 Model 1: One-Word-In, One-Word-Out Sequences

We can start with a very simple model. Given one word as input, the model will learn to predict the next word in the sequence. For example:

X, y
Jack, and
and, Jill
Jill, went
Listing 19.3: Example of input and output pairs.

The first step is to encode the text as integers. Each lowercase word in the source text is assigned a unique integer and we can convert the sequences of words to sequences of integers. Keras provides the Tokenizer class that can be used to perform this encoding. First, the Tokenizer is fit on the source text to develop the mapping from words to unique integers. Then sequences of text can be converted to sequences of integers by calling the texts_to_sequences() function.

# integer encode text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
encoded = tokenizer.texts_to_sequences([data])[0]

Listing 19.4: Example of training a Tokenizer on the sample text.

We will need to know the size of the vocabulary later, both for defining the word embedding layer in the model and for encoding output words using a one hot encoding. The size of the vocabulary can be retrieved from the trained Tokenizer by accessing the word_index attribute.

# determine the vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)

Listing 19.5: Summarize the size of the vocabulary.

Running this example, we can see that the size of the vocabulary is 21 words. We add one because we will need to specify the integer for the largest encoded word as an array index, e.g. words encoded 1 to 21 need array indices 0 to 21, or 22 positions. Next, we need to create sequences of words to fit the model with one word as input and one word as output.

# create word -> word sequences
sequences = list()
for i in range(1, len(encoded)):
    sequence = encoded[i-1:i+1]
    sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))

Listing 19.6: Example of encoding the source text.

Running this piece shows that we have a total of 24 input-output pairs to train the network.

Total Sequences: 24

Listing 19.7: Example output of summarizing the encoded text.

We can then split the sequences into input (X) and output elements (y). This is straightforward as we only have two columns in the data.

# split into X and y elements
sequences = array(sequences)
X, y = sequences[:,0], sequences[:,1]

Listing 19.8: Split the encoded text into input and output pairs.
We will fit our model to predict a probability distribution across all words in the vocabulary. That means that we need to turn the output element from a single integer into a one hot encoding with a 0 for every word in the vocabulary and a 1 for the actual word. This gives the network a ground truth to aim for, from which we can calculate error and update the model. Keras provides the to_categorical() function that we can use to convert the integer to a one hot encoding while specifying the number of classes as the vocabulary size.

# one hot encode outputs
y = to_categorical(y, num_classes=vocab_size)

Listing 19.9: One hot encode the output words.

We are now ready to define the neural network model. The model uses a learned word embedding in the input layer. This has one real-valued vector for each word in the vocabulary, where each word vector has a specified length. In this case we will use a 10-dimensional projection. The input sequence contains a single word, therefore input_length=1. The model has a single hidden LSTM layer with 50 units. This is far more than is needed. The output layer is comprised of one neuron for each word in the vocabulary and uses a softmax activation function to ensure the output is normalized to look like a probability.

# define the model
def define_model(vocab_size):
    model = Sequential()
    model.add(Embedding(vocab_size, 10, input_length=1))
    model.add(LSTM(50))
    model.add(Dense(vocab_size, activation='softmax'))
    # compile network
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize defined model
    model.summary()
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

Listing 19.10: Define and compile the language model.

The structure of the network can be summarized as follows:

Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 1, 10)             220
_________________________________________________________________
lstm_1 (LSTM)                (None, 50)                12200
_________________________________________________________________
dense_1 (Dense)              (None, 22)                1122
=================================================================
Total params: 13,542
Trainable params: 13,542
Non-trainable params: 0

Listing 19.11: Example output summarizing the defined model.

A plot of the defined model is then saved to file with the name model.png.
Figure 19.1: Plot of the defined word-based language model.

We will use this same general network structure for each example in this tutorial, with minor changes to the learned embedding layer. We can compile and fit the network on the encoded text data. Technically, we are modeling a multiclass classification problem (predict the word in the vocabulary), therefore using the categorical cross entropy loss function. We use the efficient Adam implementation of gradient descent and track accuracy at the end of each epoch. The model is fit for 500 training epochs, again, perhaps more than is needed. The network configuration was not tuned for this and later experiments; an over-prescribed configuration was chosen to ensure that we could focus on the framing of the language model.

After the model is fit, we test it by passing it a given word from the vocabulary and having the model predict the next word. Here we pass in Jack by encoding it and calling model.predict_classes() to get the integer output for the predicted word. This is then looked up in the vocabulary mapping to give the associated word.

# evaluate
in_text = 'Jack'
print(in_text)
encoded = tokenizer.texts_to_sequences([in_text])[0]
encoded = array(encoded)
yhat = model.predict_classes(encoded, verbose=0)
for word, index in tokenizer.word_index.items():
    if index == yhat:
        print(word)

Listing 19.12: Evaluate the fit language model.

This process could then be repeated a few times to build up a generated sequence of words. To make this easier, we wrap up the behavior in a function that we can call by passing in our model and the seed word.

# generate a sequence from the model
def generate_seq(model, tokenizer, seed_text, n_words):
    in_text, result = seed_text, seed_text
    # generate a fixed number of words
    for _ in range(n_words):
        # encode the text as integer
        encoded = tokenizer.texts_to_sequences([in_text])[0]
        encoded = array(encoded)
        # predict a word in the vocabulary
        yhat = model.predict_classes(encoded, verbose=0)
        # map predicted word index to word
        out_word = ''
        for word, index in tokenizer.word_index.items():
            if index == yhat:
                out_word = word
                break
        # append to input
        in_text, result = out_word, result + ' ' + out_word
    return result

Listing 19.13: Function to generate output sequences given a fit model.

We can tie all of this together. The complete code listing is provided below.

from numpy import array
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
from keras.utils.vis_utils import plot_model
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Embedding

# generate a sequence from the model
def generate_seq(model, tokenizer, seed_text, n_words):
    in_text, result = seed_text, seed_text
    # generate a fixed number of words
    for _ in range(n_words):
        # encode the text as integer
        encoded = tokenizer.texts_to_sequences([in_text])[0]
        encoded = array(encoded)
        # predict a word in the vocabulary
        yhat = model.predict_classes(encoded, verbose=0)
        # map predicted word index to word
        out_word = ''
        for word, index in tokenizer.word_index.items():
            if index == yhat:
                out_word = word
                break
        # append to input
        in_text, result = out_word, result + ' ' + out_word
    return result

# define the model
def define_model(vocab_size):
    model = Sequential()
    model.add(Embedding(vocab_size, 10, input_length=1))
    model.add(LSTM(50))
    model.add(Dense(vocab_size, activation='softmax'))
    # compile network
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize defined model
    model.summary()
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

# source text
data = """ Jack and Jill went up the hill\n
To fetch a pail of water\n
Jack fell down and broke his crown\n
And Jill came tumbling after\n """
# integer encode text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
encoded = tokenizer.texts_to_sequences([data])[0]
# determine the vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)
# create word -> word sequences
sequences = list()
for i in range(1, len(encoded)):
    sequence = encoded[i-1:i+1]
    sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))
# split into X and y elements
sequences = array(sequences)
X, y = sequences[:,0], sequences[:,1]
# one hot encode outputs
y = to_categorical(y, num_classes=vocab_size)
# define model
model = define_model(vocab_size)
# fit network
model.fit(X, y, epochs=500, verbose=2)
# evaluate
print(generate_seq(model, tokenizer, 'Jack', 6))

Listing 19.14: Complete example of Model 1.

Running the example prints the loss and accuracy each training epoch.

...
Epoch 496/500
0s - loss: acc:
Epoch 497/500
0s - loss: acc:
Epoch 498/500
0s - loss: acc:
Epoch 499/500
0s - loss: acc:
Epoch 500/500
0s - loss: acc:

Listing 19.15: Example output of fitting the language model.

We can see that the model does not memorize the source sequences, likely because there is some ambiguity in the input sequences, for example:
jack => and
jack => fell

Listing 19.16: Example output of predicting the next word.

And so on. At the end of the run, Jack is passed in and a prediction or new sequence is generated. We get a reasonable sequence as output that has some elements of the source.

Note: Given the stochastic nature of neural networks, your specific results may vary. Consider running the example a few times.

Jack and jill came tumbling after down

Listing 19.17: Example output of predicting a sequence of words.

This is a good first-cut language model, but it does not take full advantage of the LSTM's ability to handle sequences of input and disambiguate some of the ambiguous pairwise sequences by using a broader context.

19.5 Model 2: Line-by-Line Sequence

Another approach is to split up the source text line-by-line, then break each line down into a series of words that build up. For example:

X, y
_, _, _, _, _, Jack, and
_, _, _, _, Jack, and, Jill
_, _, _, Jack, and, Jill, went
_, _, Jack, and, Jill, went, up
_, Jack, and, Jill, went, up, the
Jack, and, Jill, went, up, the, hill

Listing 19.18: Example framing of the problem as sequences of words.

This approach may allow the model to use the context of each line to help the model in those cases where a simple one-word-in-and-out model creates ambiguity. In this case, this comes at the cost of predicting words across lines, which might be fine for now if we are only interested in modeling and generating lines of text. Note that in this representation, we will require a padding of sequences to ensure they meet a fixed-length input. This is a requirement when using Keras. First, we can create the sequences of integers, line by line, using the Tokenizer already fit on the source text.

# create line-based sequences
sequences = list()
for line in data.split('\n'):
    encoded = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(encoded)):
        sequence = encoded[:i+1]
        sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))

Listing 19.19: Example of preparing sequences of words.
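The excerpt ends here, before the padding step that the text above calls for. As a hedged sketch of that next step (not a listing from the book), the variable-length line-based sequences can be pre-padded to the length of the longest sequence with the pad_sequences() utility from Keras, after which the last column is split off as the output word.

# A sketch of the padding step described above, assuming the sequences list,
# tokenizer, and vocab_size from the previous listings; illustrative only.
from numpy import array
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

# pad input sequences to the length of the longest sequence
max_length = max([len(seq) for seq in sequences])
sequences = pad_sequences(sequences, maxlen=max_length, padding='pre')
print('Max Sequence Length: %d' % max_length)

# split into input (all words but the last) and output (the last word)
sequences = array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]
y = to_categorical(y, num_classes=vocab_size)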
More informationRecurrent Neural Networks
Recurrent Neural Networks 11-785 / Fall 2018 / Recitation 7 Raphaël Olivier Recap : RNNs are magic They have infinite memory They handle all kinds of series They re the basis of recent NLP : Translation,
More informationEmel: Deep Learning in One Line using Type Inference, Architecture Selection, and Hyperparameter Tuning
Emel: Deep Learning in One Line using Type Inference, Architecture Selection, and Hyperparameter Tuning BARAK OSHRI and NISHITH KHANDWALA We present Emel, a new framework for training baseline supervised
More information27: Hybrid Graphical Models and Neural Networks
10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look
More informationMoonRiver: Deep Neural Network in C++
MoonRiver: Deep Neural Network in C++ Chung-Yi Weng Computer Science & Engineering University of Washington chungyi@cs.washington.edu Abstract Artificial intelligence resurges with its dramatic improvement
More informationDeep Learning and Its Applications
Convolutional Neural Network and Its Application in Image Recognition Oct 28, 2016 Outline 1 A Motivating Example 2 The Convolutional Neural Network (CNN) Model 3 Training the CNN Model 4 Issues and Recent
More informationTutorial on Machine Learning Tools
Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow
More informationSentiment Classification of Food Reviews
Sentiment Classification of Food Reviews Hua Feng Department of Electrical Engineering Stanford University Stanford, CA 94305 fengh15@stanford.edu Ruixi Lin Department of Electrical Engineering Stanford
More informationRecurrent Neural Nets II
Recurrent Neural Nets II Steven Spielberg Pon Kumar, Tingke (Kevin) Shen Machine Learning Reading Group, Fall 2016 9 November, 2016 Outline 1 Introduction 2 Problem Formulations with RNNs 3 LSTM for Optimization
More informationEnd-To-End Spam Classification With Neural Networks
End-To-End Spam Classification With Neural Networks Christopher Lennan, Bastian Naber, Jan Reher, Leon Weber 1 Introduction A few years ago, the majority of the internet s network traffic was due to spam
More informationINFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE
15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find
More informationLecture 7: Neural network acoustic models in speech recognition
CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 7: Neural network acoustic models in speech recognition Outline Hybrid acoustic modeling overview Basic
More informationMachine Learning Practice and Theory
Machine Learning Practice and Theory Day 9 - Feature Extraction Govind Gopakumar IIT Kanpur 1 Prelude 2 Announcements Programming Tutorial on Ensemble methods, PCA up Lecture slides for usage of Neural
More informationNeural Network Optimization and Tuning / Spring 2018 / Recitation 3
Neural Network Optimization and Tuning 11-785 / Spring 2018 / Recitation 3 1 Logistics You will work through a Jupyter notebook that contains sample and starter code with explanations and comments throughout.
More informationJOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation
JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based
More informationNeural networks. About. Linear function approximation. Spyros Samothrakis Research Fellow, IADS University of Essex.
Neural networks Spyros Samothrakis Research Fellow, IADS University of Essex About Linear function approximation with SGD From linear regression to neural networks Practical aspects February 28, 2017 Conclusion
More informationCharacterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager
Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance
More informationResidual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina
Residual Networks And Attention Models cs273b Recitation 11/11/2016 Anna Shcherbina Introduction to ResNets Introduced in 2015 by Microsoft Research Deep Residual Learning for Image Recognition (He, Zhang,
More informationDynamic Routing Between Capsules
Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationA Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition
A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition Théodore Bluche, Hermann Ney, Christopher Kermorvant SLSP 14, Grenoble October
More informationMachine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center
Machine Learning With Python Bin Chen Nov. 7, 2017 Research Computing Center Outline Introduction to Machine Learning (ML) Introduction to Neural Network (NN) Introduction to Deep Learning NN Introduction
More informationA Deep Relevance Matching Model for Ad-hoc Retrieval
A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese
More informationDEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla
DEEP LEARNING REVIEW Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature 2015 -Presented by Divya Chitimalla What is deep learning Deep learning allows computational models that are composed of multiple
More informationGenerative Adversarial Text to Image Synthesis
Generative Adversarial Text to Image Synthesis Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee Presented by: Jingyao Zhan Contents Introduction Related Work Method
More informationPTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks
PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks Pramod Srinivasan CS591txt - Text Mining Seminar University of Illinois, Urbana-Champaign April 8, 2016 Pramod Srinivasan
More informationCIS581: Computer Vision and Computational Photography Project 4, Part B: Convolutional Neural Networks (CNNs) Due: Dec.11, 2017 at 11:59 pm
CIS581: Computer Vision and Computational Photography Project 4, Part B: Convolutional Neural Networks (CNNs) Due: Dec.11, 2017 at 11:59 pm Instructions CNNs is a team project. The maximum size of a team
More informationDeep Learning in NLP. Horacio Rodríguez. AHLT Deep Learning 2 1
Deep Learning in NLP Horacio Rodríguez AHLT Deep Learning 2 1 Outline Introduction Short review of Distributional Semantics, Semantic spaces, VSM, Embeddings Embedding of words Embedding of more complex
More informationLSTM: An Image Classification Model Based on Fashion-MNIST Dataset
LSTM: An Image Classification Model Based on Fashion-MNIST Dataset Kexin Zhang, Research School of Computer Science, Australian National University Kexin Zhang, U6342657@anu.edu.au Abstract. The application
More informationDeep Learning Cook Book
Deep Learning Cook Book Robert Haschke (CITEC) Overview Input Representation Output Layer + Cost Function Hidden Layer Units Initialization Regularization Input representation Choose an input representation
More informationAn Exploration of Computer Vision Techniques for Bird Species Classification
An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex
More informationReal-time Gesture Pattern Classification with IMU Data
Real-time Gesture Pattern Classification with IMU Data Alex Fu Stanford University Computer Science Department alexfu@stanford.edu Yangyang Yu Stanford University Electrical Engineering Department yyu10@stanford.edu
More informationSemantic text features from small world graphs
Semantic text features from small world graphs Jurij Leskovec 1 and John Shawe-Taylor 2 1 Carnegie Mellon University, USA. Jozef Stefan Institute, Slovenia. jure@cs.cmu.edu 2 University of Southampton,UK
More informationRecurrent Neural Networks. Nand Kishore, Audrey Huang, Rohan Batra
Recurrent Neural Networks Nand Kishore, Audrey Huang, Rohan Batra Roadmap Issues Motivation 1 Application 1: Sequence Level Training 2 Basic Structure 3 4 Variations 5 Application 3: Image Classification
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationDeep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity
Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity Abstract: This project aims at creating a benchmark for Deep Learning (DL) algorithms
More informationDeep Learning for Computer Vision II
IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L
More informationVISION & LANGUAGE From Captions to Visual Concepts and Back
VISION & LANGUAGE From Captions to Visual Concepts and Back Brady Fowler & Kerry Jones Tuesday, February 28th 2017 CS 6501-004 VICENTE Agenda Problem Domain Object Detection Language Generation Sentence
More informationDEEP LEARNING IN PYTHON. Introduction to deep learning
DEEP LEARNING IN PYTHON Introduction to deep learning Imagine you work for a bank You need to predict how many transactions each customer will make next year Example as seen by linear regression Age Bank
More informationDeep Neural Networks Applications in Handwriting Recognition
Deep Neural Networks Applications in Handwriting Recognition 2 Who am I? Théodore Bluche PhD defended at Université Paris-Sud last year Deep Neural Networks for Large Vocabulary Handwritten
More informationBayesian model ensembling using meta-trained recurrent neural networks
Bayesian model ensembling using meta-trained recurrent neural networks Luca Ambrogioni l.ambrogioni@donders.ru.nl Umut Güçlü u.guclu@donders.ru.nl Yağmur Güçlütürk y.gucluturk@donders.ru.nl Julia Berezutskaya
More informationNatural Language Processing Basics. Yingyu Liang University of Wisconsin-Madison
Natural Language Processing Basics Yingyu Liang University of Wisconsin-Madison Natural language Processing (NLP) The processing of the human languages by computers One of the oldest AI tasks One of the
More informationPouya Kousha Fall 2018 CSE 5194 Prof. DK Panda
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing
More informationLayerwise Interweaving Convolutional LSTM
Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States
More informationDeepFace: Closing the Gap to Human-Level Performance in Face Verification
DeepFace: Closing the Gap to Human-Level Performance in Face Verification Report on the paper Artem Komarichev February 7, 2016 Outline New alignment technique New DNN architecture New large dataset with
More informationXES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework
XES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework Demo Paper Joerg Evermann 1, Jana-Rebecca Rehse 2,3, and Peter Fettke 2,3 1 Memorial University of Newfoundland 2 German Research
More informationDialog System & Technology Challenge 6 Overview of Track 1 - End-to-End Goal-Oriented Dialog learning
Dialog System & Technology Challenge 6 Overview of Track 1 - End-to-End Goal-Oriented Dialog learning Julien Perez 1 and Y-Lan Boureau 2 and Antoine Bordes 2 1 Naver Labs Europe, Grenoble, France 2 Facebook
More informationA Simple (?) Exercise: Predicting the Next Word
CS11-747 Neural Networks for NLP A Simple (?) Exercise: Predicting the Next Word Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Are These Sentences OK? Jane went to the store. store to Jane
More informationImage Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction by Noh, Hyeonwoo, Paul Hongsuck Seo, and Bohyung Han.[1] Presented : Badri Patro 1 1 Computer Vision Reading
More informationMachine Learning. MGS Lecture 3: Deep Learning
Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ Machine Learning MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ WHAT IS DEEP LEARNING? Shallow network: Only one hidden layer
More informationEfficient Algorithms may not be those we think
Efficient Algorithms may not be those we think Yann LeCun, Computational and Biological Learning Lab The Courant Institute of Mathematical Sciences New York University http://yann.lecun.com http://www.cs.nyu.edu/~yann
More informationShow, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks
Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Zelun Luo Department of Computer Science Stanford University zelunluo@stanford.edu Te-Lin Wu Department of
More informationStating the obvious, people and computers do not speak the same language.
3.4 SYSTEM SOFTWARE 3.4.3 TRANSLATION SOFTWARE INTRODUCTION Stating the obvious, people and computers do not speak the same language. People have to write programs in order to instruct a computer what
More informationNeural Networks for Machine Learning. Lecture 15a From Principal Components Analysis to Autoencoders
Neural Networks for Machine Learning Lecture 15a From Principal Components Analysis to Autoencoders Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed Principal Components
More informationCS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016
CS 1674: Intro to Computer Vision Neural Networks Prof. Adriana Kovashka University of Pittsburgh November 16, 2016 Announcements Please watch the videos I sent you, if you haven t yet (that s your reading)
More informationIndex. Springer Nature Switzerland AG 2019 B. Moons et al., Embedded Deep Learning,
Index A Algorithmic noise tolerance (ANT), 93 94 Application specific instruction set processors (ASIPs), 115 116 Approximate computing application level, 95 circuits-levels, 93 94 DAS and DVAS, 107 110
More informationDeep Neural Networks Applications in Handwriting Recognition
Deep Neural Networks Applications in Handwriting Recognition Théodore Bluche theodore.bluche@gmail.com São Paulo Meetup - 9 Mar. 2017 2 Who am I? Théodore Bluche PhD defended
More informationA Deep Learning primer
A Deep Learning primer Riccardo Zanella r.zanella@cineca.it SuperComputing Applications and Innovation Department 1/21 Table of Contents Deep Learning: a review Representation Learning methods DL Applications
More informationNovel Lossy Compression Algorithms with Stacked Autoencoders
Novel Lossy Compression Algorithms with Stacked Autoencoders Anand Atreya and Daniel O Shea {aatreya, djoshea}@stanford.edu 11 December 2009 1. Introduction 1.1. Lossy compression Lossy compression is
More informationA Hybrid Neural Model for Type Classification of Entity Mentions
A Hybrid Neural Model for Type Classification of Entity Mentions Motivation Types group entities to categories Entity types are important for various NLP tasks Our task: predict an entity mention s type
More informationArtificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( )
Structure: 1. Introduction 2. Problem 3. Neural network approach a. Architecture b. Phases of CNN c. Results 4. HTM approach a. Architecture b. Setup c. Results 5. Conclusion 1.) Introduction Artificial
More informationMulti-Glance Attention Models For Image Classification
Multi-Glance Attention Models For Image Classification Chinmay Duvedi Stanford University Stanford, CA cduvedi@stanford.edu Pararth Shah Stanford University Stanford, CA pararth@stanford.edu Abstract We
More informationOutlier detection using autoencoders
Outlier detection using autoencoders August 19, 2016 Author: Olga Lyudchik Supervisors: Dr. Jean-Roch Vlimant Dr. Maurizio Pierini CERN Non Member State Summer Student Report 2016 Abstract Outlier detection
More informationFuzzy Set Theory in Computer Vision: Example 3, Part II
Fuzzy Set Theory in Computer Vision: Example 3, Part II Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Resource; CS231n: Convolutional Neural Networks for Visual Recognition https://github.com/tuanavu/stanford-
More informationReport: Privacy-Preserving Classification on Deep Neural Network
Report: Privacy-Preserving Classification on Deep Neural Network Janno Veeorg Supervised by Helger Lipmaa and Raul Vicente Zafra May 25, 2017 1 Introduction In this report we consider following task: how
More information16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning. Spring 2018 Lecture 14. Image to Text
16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning Spring 2018 Lecture 14. Image to Text Input Output Classification tasks 4/1/18 CMU 16-785: Integrated Intelligence in Robotics
More informationRNNs as Directed Graphical Models
RNNs as Directed Graphical Models Sargur Srihari srihari@buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 10. Topics in Sequence Modeling Overview
More information